Processing Interval Joins On Map-Reduce

Authors

  • Bhupesh Chawda
  • Himanshu Gupta
  • Sumit Negi
  • Tanveer A. Faruquie
  • L. Venkata Subramaniam
  • Mukesh K. Mohania
Abstract

In this paper we investigate the problem of processing multiway interval joins on a map-reduce platform. We look at join queries formed by interval predicates as defined by Allen’s interval algebra. These predicates can be classified into two groups: colocation-based predicates and sequence-based predicates. A colocation predicate requires two intervals to share at least one common point, while a sequence predicate requires two intervals to be disjoint. An interval join query can therefore be thought of as belonging to one of three classes: (a) queries containing only colocation-based predicates, (b) queries containing only sequence-based predicates, and (c) queries containing both classes of predicates. We address these three classes of join queries, discuss the challenges, and present novel approaches for processing these queries on a map-reduce platform. We also discuss why the current approaches developed for handling join queries on real-valued data cannot be directly used to handle interval joins. Finally, we extend the approaches developed to handle join queries containing multiple interval attributes, as well as join queries containing both interval and non-interval attributes. Through experimental evaluations on both synthetic and real-life datasets, we demonstrate that the proposed approaches comfortably outperform naive approaches.
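The two predicate groups and the three query classes described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes closed intervals (both end points belong to the interval), and the names `Interval`, `is_colocated`, `is_sequenced`, `SEQUENCE_PREDICATES`, and `query_class` are hypothetical, introduced only for illustration. Treating Allen's `before` and `after` as the only sequence predicates is likewise an assumption that follows from the closed-interval reading.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    start: float
    end: float  # assumption: closed interval, so `end` belongs to it


def is_colocated(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one common point."""
    return max(a.start, b.start) <= min(a.end, b.end)


def is_sequenced(a: Interval, b: Interval) -> bool:
    """True when the two intervals are disjoint (one lies entirely before the other)."""
    return not is_colocated(a, b)


# Assumption: with closed intervals, only Allen's before/after leave the
# operands disjoint; every other Allen relation shares at least one point.
SEQUENCE_PREDICATES = {"before", "after"}


def query_class(predicates: Iterable[str]) -> str:
    """Map a query's predicate names to class "a", "b", or "c" from the abstract."""
    kinds = {
        "sequence" if p in SEQUENCE_PREDICATES else "colocation"
        for p in predicates
    }
    if not kinds:
        raise ValueError("a join query needs at least one predicate")
    if kinds == {"colocation"}:
        return "a"  # only colocation-based predicates
    if kinds == {"sequence"}:
        return "b"  # only sequence-based predicates
    return "c"      # both classes of predicates
```

Under these assumptions, `Interval(1, 5)` and `Interval(5, 9)` are colocated because they share the point 5, while `Interval(1, 4)` and `Interval(5, 9)` satisfy only a sequence predicate.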

Similar resources

Cascading map-side joins over HBase for scalable join processing

One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins. Since Semantic Web datasets published in RDF have increased rapidly over the last few years, scalable join techniques become an important issue for SPARQL query processing as well. In this paper, we introduce the Map-Side Index Nested Loop Join (MAPSIN join) which combines scalable index...

Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework

The Map/Reduce framework, a parallel processing paradigm, is widely used for large-scale distributed data processing. Map/Reduce can perform typical relational database operations such as selection, aggregation, and projection. However, binary relational operators like join, Cartesian product, and set operations are difficult to implement with Map/Reduce. Map/Reduce can process homogeneous ...

Runtime Optimization of Join Location in Parallel Data Management Systems

Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce side joins, or by fetching data from the storage system to compute nodes, corresponding to map side join. Both m...

Interval Count Semi-Joins

Interval joins find applications in several domains, including temporal and spatial databases, uncertain data management, and streaming data processing. In this paper, we study the evaluation of an interval count semi-join (ICSJ) operation that can be used for selecting or ranking intervals based on the number of join pairs they appear in. We extend the state-of-the-art algorithm for interval joi...

RDFChain: Chain Centric Storage for Scalable Join Processing of RDF Graphs using MapReduce and HBase

As massive linked open data is available in RDF, scalable storage and efficient retrieval using MapReduce have been actively studied. Most previous research focuses on reducing the number of MapReduce jobs for processing join operations in SPARQL queries. However, the cost of the shuffle phase is still incurred because of their reduce-side joins. In this paper, we propose RDFChain which supports th...

Publication date: 2014